Taking MT Evaluation Metrics to Extremes: Beyond Correlation with Human Judgments
Authors
Abstract
Similar resources
METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments
We describe METEOR, an automatic metric for machine translation evaluation that is based on a generalized concept of unigram matching between the machine-produced translation and human-produced reference translations. Unigrams can be matched based on their surface forms, stemmed forms, and meanings; furthermore, METEOR can be easily extended to include more advanced matching strategies. Once all...
METEOR: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments
Meteor is an automatic metric for Machine Translation evaluation which has been demonstrated to have high levels of correlation with human judgments of translation quality, significantly outperforming the more commonly used Bleu metric. It is one of several automatic metrics used in this year’s shared task within the ACL WMT-07 workshop. This paper recaps the technical details underlying the me...
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
Taking Coherence to Extremes
In most nuclear magnetic resonance (NMR) experiments, a uniform magnetic field is applied to the sample, and radio frequency pulses are used to manipulate particular nuclear spins. Under ex situ conditions, such as deep in a bore hole, it is difficult to apply high magnetic fields or to do so uniformly, and, thus, approaches in which signals are generated by suddenly reversing the applied magne...
BLANC: Learning Evaluation Metrics for MT
We introduce BLANC, a family of dynamic, trainable evaluation metrics for machine translation. Flexible, parametrized models can be learned from past data and automatically optimized to correlate well with human judgments for different criteria (e.g. adequacy, fluency) using different correlation measures. Towards this end, we discuss ACS (all common skip-ngrams), a practical algorithm with trai...
Journal
Journal title: Computational Linguistics
Year: 2019
ISSN: 0891-2017,1530-9312
DOI: 10.1162/coli_a_00356